1. INTRODUCTION

Cosmetics are substances or preparations intended for use on the external parts of the human body
(epidermis, hair, nails, lips and external genital organs) or on the teeth and mucous membranes of the mouth,
especially to clean, perfume, change the appearance of, improve the odor of, protect, or keep the body in
good condition [1]. Cosmetic companies offer a wide variety of products and keep them in storage
warehouses. As a result, some products are difficult to find quickly, and sometimes cannot be found at all
even though they are still available in the warehouse. Cosmetic companies therefore need solutions for
managing beauty products (cosmetics) and strategies that can increase sales, especially by improving
service. In this way, they can run their business according to their needs and manage it, particularly their
stock of goods, more efficiently and effectively.

Data mining is an extension and branch of statistics. It is the process of discovering new information
by looking for patterns or rules in a large accumulation of data, often referred to as big data [2], [3].
Data mining can also be interpreted as the process of finding or exploring value in data in the form of
useful knowledge that could not be discovered manually. Such problems call for data mining techniques
such as clustering with the K-means clustering algorithm [4]. Zhao et al. stated that the K-means procedure
makes the process of grouping documents, searching for nearest neighbors, and grouping images simpler and
converges to a much better local optimum [5]. Another analysis states that, among many clustering
algorithms, K-means is widely used because of its simple algorithm and fast convergence [6]. In this study,
clustering with the K-means algorithm is used to measure the proximity between cosmetic products based
on the transactions that have occurred, since clustering in data mining can be used to analyze the
grouping of sales transactions. The analysis proceeds by examining the problems and needs involved:
determining the grouping of cosmetic sales transaction data, applying a tested analysis to the grouping of
cosmetic sales data, assessing the indicators of the cause of the problem, and, at the last stage, designing
a system that can solve the problem. The analysis of cosmetic product management in this study aims to
identify which products are best sellers, quite in demand, and less selling, so that the accumulation of
these products can be prevented.


2. RESEARCH METHOD 

Data mining is the process of discovering new information by looking for patterns or rules in a large
accumulation of data, often referred to as big data [7], [8]. Data mining can also be interpreted as a series
of processes for finding or exploring added value in data in the form of useful knowledge that could not be
discovered manually [9], [10]. Data mining is not a new field; it is a development and branch of
statistics [11], [12], so the two are closely related. One of the things that makes data mining difficult
to define is that it inherits many fields, aspects and techniques from previously established
disciplines [13], [14].


2.1. K-means clustering 

K-means is one of the algorithms used in data mining to group data [15], [6], [16]. There are many
approaches to grouping, one of which is to create rules that read very large data at many levels. In another
approach, K-means also creates a set of functions to measure the properties of existing clusters [17], [18].
K-means is a distance-based clustering method that divides the data into several clusters, and it only works
on numeric attributes [19]-[21]. K-means is also a partitioning clustering method, which separates the data
into distinct parts so that similar data are grouped together [22]. K-means is very popular and easy to use
because it can group large data very quickly [23].

In the K-means method, each data point must belong to a certain cluster at a given stage of the process,
and at the next stage it can move to another cluster. The grouping of data using the K-means
method is carried out using the following algorithm [24], [25]:

— Set the desired number of clusters. 

— Place data into clusters randomly. 

— Calculate the cluster center (centroid/mean) of the data in each cluster. The location of the centroid of 
each group is taken from the average value (mean) of all data values for each feature. If M represents 
the number of data in a group, i represents the i-th feature in the group, and p represents the data 
dimension, then the equation to calculate the centroid of the i-th feature is (1). 


$$c_i = \frac{1}{M} \sum_{j=1}^{M} x_j \tag{1}$$


Equation (1) is applied to each of the p dimensions, from i = 1 to i = p.

— Allocate each data to the nearest centroid/average. There are several ways that can be done to measure 
the distance of the data to the center of the group, including Euclidean. The measurement of distance 
in Euclidean distance space can be found using (2). 


$$d = \sqrt{(x_1 - x_2)^2 + (y_1 - y_2)^2} \tag{2}$$


Re-allocate the data to each group in K-means based on the distance between the data and the centroid
of each available group. Each data point is assigned to the group whose centroid is closest to it.
This data allocation is calculated using (3).


$$a_{il} = \begin{cases} 1, & d = \min\{D(x_i, c_l)\} \\ 0, & \text{otherwise} \end{cases} \tag{3}$$


a_il is the membership value of point x_i in the group with center c_l, d is the shortest distance from data
x_i to the K groups after comparison, and c_l is the l-th centroid (center of the group). The objective
function used in the K-means method is determined based on the distance and the membership value of the
data in a group. The objective function is calculated using (4).


$$J = \sum_{i=1}^{n} \sum_{l=1}^{k} a_{il} D(x_i, c_l) \tag{4}$$


n is the number of data, k is the number of groups, and a_il is the membership value of data point x_i in
group c_l. The value of a is 0 or 1: if the data is a member of a group, then a_il = 1; otherwise,
a_il = 0. The algorithm then returns to stage 3 if any data still moves between groups, if the change in the
centroid values is above the specified threshold, or if the change in the objective function value is above
the predetermined threshold.
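
The steps above can be sketched in a few lines of Python. This is only a minimal illustration under stated assumptions, not the implementation used in this study: the function names `euclidean`, `centroid` and `kmeans`, the fixed random seed, the iteration cap, and the toy points are all hypothetical, and D in (4) is taken to be the Euclidean distance from (2).

```python
import math
import random


def euclidean(a, b):
    """Euclidean distance between two equal-length numeric vectors, as in (2)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def centroid(points):
    """Feature-wise mean of a non-empty list of points, as in (1)."""
    m = len(points)
    return tuple(sum(p[i] for p in points) / m for i in range(len(points[0])))


def kmeans(data, k, max_iter=100, seed=0):
    """Return (labels, centroids, objective) for a list of numeric tuples."""
    rng = random.Random(seed)
    # Step 2: place data into clusters randomly. Labels 0..k-1 are each used
    # at least once so that no cluster starts empty (a choice of this sketch).
    labels = list(range(k)) + [rng.randrange(k) for _ in data[k:]]
    rng.shuffle(labels)
    centroids = [None] * k
    for _ in range(max_iter):
        # Step 3: recompute each centroid; an empty cluster keeps its old one.
        for c in range(k):
            members = [p for p, l in zip(data, labels) if l == c]
            if members:
                centroids[c] = centroid(members)
        # Steps 4-5: a_il = 1 only for the nearest centroid, as in (3).
        new_labels = [
            min(range(k), key=lambda c: euclidean(p, centroids[c])) for p in data
        ]
        if new_labels == labels:  # no data point changed group, so stop
            break
        labels = new_labels
    # Objective (4): sum of distances from each point to its own centroid.
    objective = sum(euclidean(p, centroids[l]) for p, l in zip(data, labels))
    return labels, centroids, objective


# Hypothetical toy data: two well-separated groups of three points each.
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 0.5), (8.0, 8.0), (9.0, 9.5), (8.5, 9.0)]
labels, centroids, objective = kmeans(points, k=2)
print(labels, centroids, objective)
```

The sketch stops as soon as no data point changes group; a threshold on the change in the centroids or in the objective value, as described above, could be used as the stopping rule instead.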


3. RESULTS AND ANALYSIS 

The system algorithm is the sequence of steps taken by a system in processing and solving a problem.
The flow of problem solving using the K-means method is as follows. This stage is carried out by applying
the K-means algorithm with the formula [26], [27]:


$$d(x, y) = \|x - y\| = \sqrt{\sum_{i=1}^{n} (x_i - y_i)^2}, \quad i = 1, 2, 3, \ldots, n$$


The number of clusters (K) is set to 3. After setting the number of clusters, determine the initial
center point (centroid) of each cluster. The selected center points are shown in Table 1.


Table 1. Initial centroid data
Centroid     Product   Incoming product   Product sold   Stock
Centroid 1   Garnier   46                 11             8
Centroid 2   L'oreal   57                 65             9
Centroid 3   Nivea     50                 58             6


Calculate the distance of each data point to the centroids using the Euclidean formula; each data point
is then assigned to the closest cluster. The calculation of the distance between the variables of a data
sample and each centroid is illustrated with the data in Table 2.


Table 2. Cosmetics data
Product   Incoming product   Product sold   Stock
Wardah    34                 23             52


— With centroid 1 (46, 11, 8) 
Distance between Wardah and point C1


$$d = \sqrt{\sum_{i=1}^{n} (x_i - y_i)^2} = \sqrt{(34 - 46)^2 + (23 - 11)^2 + (52 - 8)^2} = 47.15930449$$


— With centroid 2 (57, 65, 9) 
Distance between Wardah and point C2


$$d = \sqrt{\sum_{i=1}^{n} (x_i - y_i)^2} = \sqrt{(34 - 57)^2 + (23 - 65)^2 + (52 - 9)^2} = 64.35837164$$


— With centroid 3(50, 58, 6) 
Distance between Wardah and point C3


$$d = \sqrt{\sum_{i=1}^{n} (x_i - y_i)^2} = \sqrt{(34 - 50)^2 + (23 - 58)^2 + (52 - 6)^2} = 59.97499479$$
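
The three distances for Wardah can be checked with a short script. This is a minimal sketch that assumes Python 3.8 or later for `math.dist`; the variable names are illustrative, while the numbers are the Table 1 centroids and the Table 2 row.

```python
import math

# Initial centroids from Table 1 and the Wardah row from Table 2,
# each given as (incoming product, product sold, stock).
centroids = {
    "C1": (46, 11, 8),   # Garnier
    "C2": (57, 65, 9),   # L'oreal
    "C3": (50, 58, 6),   # Nivea
}
wardah = (34, 23, 52)

# Euclidean distance from Wardah to every centroid.
distances = {name: math.dist(wardah, c) for name, c in centroids.items()}
# The product is assigned to the centroid with the smallest distance.
nearest = min(distances, key=distances.get)

for name, d in distances.items():
    print(f"{name}: {d:.8f}")
print("nearest centroid:", nearest)
```

The smallest of the three distances is the one to C1, so Wardah is assigned to cluster 1 in the first iteration.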
Perform the same calculation for every object up to the 30th. The results of the iteration 1 calculation
are shown in Table 3, where the closest distance is the smallest of the distances to the cluster centers.
The within cluster variation (WCV) is the square of the closest distance to the center of the cluster.
The distances for each data row are listed in Table 3.


Table 3. Iteration 1
Code   Product           C1         C2         C3         Cluster
A001   Wardah            47.1593    64.35837   59.97499   Cluster 1
A002   Emina             45.32108   11.22497   2.236068   Cluster 3
A003   Maybelline        7.28011    50.32892   41.78516   Cluster 1
A004   Garnier           0          55.11806   47.21229   Cluster 1
A005   Make Over         54.01852   12         9.110434   Cluster 3
A006   Viva              42.80187   12.40967   7.549834   Cluster 3
A007   Nivea             47.21229   10.34408   0          Cluster 3
A008   Purbasari         43.0581    14.3527    4.582576   Cluster 3
A009   Sariayu           44         14.89966   5.385165   Cluster 3
A010   Pond's            1          54.92722   47.13809   Cluster 1
A011   Rexona            54.09251   14         10.34408   Cluster 3
A012   Y.O.U             42.09513   17.72005   8.774964   Cluster 3
A013   Marcks            47.42362   9.110434   2          Cluster 3
A014   Pigeon            43.01163   15.68439   5.744563   Cluster 3
A015   Inez              44.55334   10.81665   4.690416   Cluster 3
A016   Pixy              0          55.11806   47.21229   Cluster 1
A017   L'oreal           55.11806   0          10.34408   Cluster 2
A018   Mustika Ratu      46.56179   15.0333    17.23369   Cluster 2
A019   Madam Gie         55.26301   19.54482   25         Cluster 2
A020   M.A.C             47.86439   -          17.49286   Cluster 2
A021   Revlon            45.60702   -          8.774964   Cluster 3
A022   Citra             63.79655   -          65         Cluster 1
A023   Bioaqua           50.69517   -          69.53416   Cluster 1
A024   Fair and Lovely   54.53439   4.472136   9.110434   Cluster 2
A025   La Tulipe         64.16385   47.54997   49.8999    Cluster 2
A026   Olay              47.77028   7.874008   6.403124   Cluster 3
A027   Silkygirl         71.96527   57.10517   59.34644   Cluster 2
A028   Marina            63.00794   47.24405   47.61302   Cluster 2
A029   Lakeme            60.49793   54.49771   53.93515   Cluster 3
A030   Safi              72.86975   56.93856   59.06776   Cluster 2


From Table 3, the membership is as follows:
— Cluster 1 = {Wardah, Maybelline, Garnier, Pond's, Pixy, Citra, Bioaqua}
— Cluster 2 = {L'oreal, Mustika Ratu, Madam Gie, M.A.C, Fair and Lovely, La Tulipe, Silkygirl, Marina,
Safi}
— Cluster 3 = {Emina, Make Over, Viva, Nivea, Purbasari, Sariayu, Rexona, Y.O.U, Marcks, Pigeon,
Inez, Revlon, Olay, Lakeme}

Perform a centroid update from the cluster results as follows:
— Cluster 1 = average (Wardah, Maybelline, Garnier, Pond's, Pixy, Citra, Bioaqua) = (24.27587571,
60.14604044, 53.97956857)
— Cluster 2 = average (L'oreal, Mustika Ratu, Madam Gie, M.A.C, Fair and Lovely, La Tulipe, Silkygirl,
Marina, Safi) = (59.03871667, 29.20977812, 32.78979822)
— Cluster 3 = average (Emina, Make Over, Viva, Nivea, Purbasari, Sariayu, Rexona, Y.O.U, Marcks,
Pigeon, Inez, Revlon, Olay, Lakeme) = (47.24738, 15.35955926, 9.252238429)
— The centroid value changes from the previous centroid value, so the algorithm continues to the next
step.
— Calculate the distance of the data to the centroid using the Euclidean formula; the data will be
designated as a member of the closest cluster.
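
One pass of assignment and centroid update can be sketched as follows. This is an illustrative sketch only: it uses a six-product subset of the data rows shown in this paper, chosen for brevity, and updates each centroid as the feature-wise mean defined in (1). The centroids it produces are therefore hypothetical illustrations and are not the values reported for the full 30-product data set. All variable names are assumptions.

```python
import math

# (incoming product, product sold, stock) for a few products, copied from the
# data rows in this paper; the subset is chosen only to keep the example short.
products = {
    "Wardah": (34, 23, 52),
    "Maybelline": (42, 17, 7),
    "Emina": (51, 56, 6),
    "Make Over": (45, 65, 9),
    "Mustika Ratu": (66, 53, 10),
    "Madam Gie": (75, 58, 6),
}
initial_centroids = [(46, 11, 8), (57, 65, 9), (50, 58, 6)]  # Table 1

assignment = {}  # product -> index of the nearest centroid (0, 1 or 2)
wcv = {}         # product -> squared distance to that nearest centroid
for name, x in products.items():
    dists = [math.dist(x, c) for c in initial_centroids]
    nearest = min(range(len(dists)), key=dists.__getitem__)
    assignment[name] = nearest
    wcv[name] = dists[nearest] ** 2

# Feature-wise mean of the members of each cluster, following (1).
updated = []
for k, old in enumerate(initial_centroids):
    members = [products[n] for n, a in assignment.items() if a == k]
    if members:
        updated.append(tuple(sum(col) / len(members) for col in zip(*members)))
    else:
        updated.append(old)  # keep the old centroid if a cluster is empty

print(assignment)
print(updated)
```

Index 0 corresponds to cluster 1, index 1 to cluster 2 and index 2 to cluster 3; for these six products the assignments agree with the Cluster column of Table 3.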


Next, iteration 2 is calculated in the same way as iteration 1, and the process is repeated until the ratio
value equals the previous ratio value. After 3 iterations, the centroid value decreases from the previous
centroid value, and the final result is shown in Table 4.
Table 4. Grouping of cluster results
Product           C1 (incoming product)   C2 (product sold)   C3 (stock)   Cluster
Wardah            34                      23                  52           Cluster 1
Maybelline        42                      17                  7            Cluster 1
Garnier           46                      11                  8            Cluster 1
Pond's            47                      11                  8            Cluster 1
Pixy              46                      11                  8            Cluster 1
Citra             11                      38                  54           Cluster 1
Emina             51                      56                  6            Cluster 2
Make Over         45                      65                  9            Cluster 2
Viva              54                      53                  10           Cluster 2
Nivea             50                      58                  6            Cluster 2
Purbasari         48                      54                  7            Cluster 2
Sariayu           46                      55                  8            Cluster 2
Rexona            43                      65                  9            Cluster 2
Y.O.U             44                      53                  10           Cluster 2
Marcks            52                      58                  6            Cluster 2
Pigeon            46                      54                  7            Cluster 2
Inez              53                      55                  8            Cluster 2
L'oreal           57                      65                  9            Cluster 2
Mustika Ratu      66                      53                  10           Cluster 2
Madam Gie         75                      58                  6            Cluster 2
M.A.C             67                      54                  7            Cluster 2
Revlon            58                      55                  8            Cluster 2
Fair and Lovely   53                      65                  11           Cluster 2
La Tulipe         58                      53                  55           Cluster 2
Olay              54                      58                  11           Cluster 2
Silkygirl         55                      54                  65           Cluster 2
Marina            43                      55                  53           Cluster 2
Lakeme            44                      45                  58           Cluster 2
Safi              52                      56                  65           Cluster 2
Bioaqua           65                      11                  55           Cluster 3


3.1. Interpretation or evaluation 
At this stage, Table 5 shows the results for the third cluster obtained with the K-means clustering
algorithm, which contains the best-selling products that must always be in stock. Table 6 shows the results
for the second cluster, which contains the products that are quite in demand and must always be in stock.
The results for the first cluster show several products that are less selling, as listed in Table 7.


Table 5. Best selling results
No   Product   Cluster     Description
1    Bioaqua   Cluster 3   Best sellers


Table 6. Results are quite in demand
No   Product           Cluster     Description
1    Emina             Cluster 2   Quite in demand
2    Make Over         Cluster 2   Quite in demand
3    Viva              Cluster 2   Quite in demand
4    Nivea             Cluster 2   Quite in demand
5    Purbasari         Cluster 2   Quite in demand
6    Sariayu           Cluster 2   Quite in demand
7    Rexona            Cluster 2   Quite in demand
8    Y.O.U             Cluster 2   Quite in demand
9    Marcks            Cluster 2   Quite in demand
10   Pigeon            Cluster 2   Quite in demand
11   Inez              Cluster 2   Quite in demand
12   L'oreal           Cluster 2   Quite in demand
13   Mustika Ratu      Cluster 2   Quite in demand
14   Madam Gie         Cluster 2   Quite in demand
15   M.A.C             Cluster 2   Quite in demand
16   Revlon            Cluster 2   Quite in demand
17   Fair and Lovely   Cluster 2   Quite in demand
18   La Tulipe         Cluster 2   Quite in demand
19   Olay              Cluster 2   Quite in demand
20   Silkygirl         Cluster 2   Quite in demand
21   Marina            Cluster 2   Quite in demand
22   Lakeme            Cluster 2   Quite in demand
23   Safi              Cluster 2   Quite in demand
Table 7. Less selling
No   Product      Cluster     Description
1    Wardah       Cluster 1   Less selling
2    Maybelline   Cluster 1   Less selling
3    Garnier      Cluster 1   Less selling
4    Pond's       Cluster 1   Less selling
5    Pixy         Cluster 1   Less selling
6    Citra        Cluster 1   Less selling
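
The interpretation step amounts to mapping each final cluster to a description and listing its products. The sketch below illustrates this with a few rows taken from Table 4; the dictionary names and the choice of rows are assumptions made for illustration, and the cluster descriptions are those used in Tables 5 to 7.

```python
from collections import defaultdict

# Final cluster per product, copied from a few rows of Table 4.
final_cluster = {
    "Wardah": 1, "Garnier": 1, "Citra": 1,
    "Emina": 2, "Nivea": 2, "Safi": 2,
    "Bioaqua": 3,
}
# Descriptions as used in Tables 5 to 7.
description = {1: "Less selling", 2: "Quite in demand", 3: "Best sellers"}

# Group product names under the description of their cluster.
report = defaultdict(list)
for product, cluster in final_cluster.items():
    report[description[cluster]].append(product)

for label in ("Best sellers", "Quite in demand", "Less selling"):
    print(f"{label}: {', '.join(report[label])}")
```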


4. CONCLUSION 


Based on the results of the data mining analysis, K-means can solve the problem of grouping cosmetic
product sales transaction data and identify which products must be kept in stock, which in turn can increase
sales profits. The K-means clustering algorithm is therefore effective for grouping cosmetic product sales
transaction data. The analysis of cosmetic product management in this study produces data on products that
are best sellers, quite in demand, and not in demand, so that the accumulation of these products can be
prevented. Because the K-means algorithm proved effective for this product grouping case, it can encourage
further research on other product grouping cases.